
    Shape Dimension and Intrinsic Metric from Samples of Manifolds

    We introduce the adaptive neighborhood graph as a data structure for modeling a smooth manifold M embedded in some Euclidean space $\mathbb{R}^d$. We assume that M is known to us only through a finite sample $P \subset M$, as is often the case in applications. The adaptive neighborhood graph is a geometric graph on P. Its complexity is at most $\min\{2^{O(k)} n, n^2\}$, where $n = |P|$ and $k = \dim M$, as opposed to the $n^{\lceil d/2 \rceil}$ complexity of the Delaunay triangulation, which is often used to model manifolds. We prove that we can correctly infer the connected components and the dimension of M from the adaptive neighborhood graph provided a certain standard sampling condition is fulfilled. The running time of the dimension detection algorithm is $d \, 2^{O(k^7 \log k)}$ for each connected component of M. If the dimension is considered constant, this is a constant-time operation, and the adaptive neighborhood graph is of linear size. Moreover, the exponential dependence of the constants is only on the intrinsic dimension k, not on the ambient dimension d. This is of particular interest if the co-dimension is high, i.e., if k is much smaller than d, as is the case in many applications. The adaptive neighborhood graph also allows us to approximate the geodesic distances between the points in P.
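
    The geodesic-distance claim in the last sentence can be illustrated with a short, hedged sketch: build a neighborhood graph on the sample and approximate geodesic distances by shortest paths, as in Isomap. This is not the paper's adaptive construction (which chooses neighborhood sizes adaptively and carries the stated guarantees); the fixed k-nearest-neighbor graph below is a simplified stand-in.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.sparse.csgraph import shortest_path

    def graph_geodesics(P, n_neighbors=8):
        """Approximate geodesic distances on a sampled manifold by
        shortest paths in a k-nearest-neighbor graph (Isomap-style)."""
        n = len(P)
        dist, idx = cKDTree(P).query(P, k=n_neighbors + 1)  # slot 0 is the point itself
        W = np.full((n, n), np.inf)                         # inf = no edge
        for i in range(n):
            W[i, idx[i, 1:]] = dist[i, 1:]                  # edge weight = Euclidean length
        W = np.minimum(W, W.T)                              # symmetrize the graph
        return shortest_path(W, method="D")                 # all-pairs Dijkstra
    ```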

    Fast and Robust Vectorized In-Place Sorting of Primitive Types

    Modern CPUs provide single instruction-multiple data (SIMD) instructions. SIMD instructions process several elements of a primitive data type simultaneously in fixed-size vectors. Classical sorting algorithms are not directly expressible in SIMD instructions. Accelerating sorting algorithms with SIMD instructions is therefore a creative endeavor. A promising approach for sorting with SIMD instructions is to use sorting networks for small arrays and Quicksort for large arrays. In this paper we improve vectorization techniques for sorting networks and Quicksort. In particular, we show how to use the full capacity of vector registers in sorting networks and how to make vectorized Quicksort robust with respect to different key distributions. To demonstrate the performance of our techniques we implement an in-place hybrid sorting algorithm for the data type int with AVX2 intrinsics. Our implementation is at least 30% faster than state-of-the-art high-performance sorting alternatives.
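
    The core idea behind vectorizing sorting networks can be sketched without intrinsics. A sorting network is a fixed, data-independent sequence of compare-exchange operations, and each compare-exchange maps to a vector min/max pair; NumPy's elementwise minimum/maximum below plays the role of the SIMD min/max instructions and sorts many 4-element arrays in lockstep. This is an illustrative sketch, not the paper's AVX2 implementation.

    ```python
    import numpy as np

    # Optimal 5-comparator sorting network for 4 inputs.
    NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

    def sort4_batched(batch):
        """Sort each row of an (m, 4) integer array; every comparator
        is one elementwise min/max over an entire column."""
        cols = [batch[:, i].copy() for i in range(4)]
        for a, b in NETWORK_4:
            cols[a], cols[b] = np.minimum(cols[a], cols[b]), np.maximum(cols[a], cols[b])
        return np.stack(cols, axis=1)

    batch = np.random.randint(0, 100, size=(1000, 4))
    assert np.array_equal(sort4_batched(batch), np.sort(batch, axis=1))
    ```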

    Lumen: A software for the interactive visualization of probabilistic models together with data

    Research in machine learning and applied statistics has led to the development of a plethora of different types of models. Lumen aims to make a particular yet broad class of models, namely probabilistic models, more easily accessible to humans. Lumen does so by providing an interactive web application for the visual exploration, comparison, and validation of probabilistic models together with the underlying data. As its main feature, Lumen lets a user rapidly and incrementally build flexible and potentially complex interactive visualizations of both the probabilistic model and the data that the model was trained on. Many classic machine learning methods learn models that predict the value of some target variable(s) given the value of some input variable(s). Probabilistic models go beyond this point estimation by predicting, instead of a particular value, a probability distribution over the target variable(s). This allows one, for instance, to estimate the prediction's uncertainty, a highly relevant quantity. For a demonstrative example, consider a model that predicts that an image of a suspicious skin area does not show a malignant tumor. Here it would be extremely valuable to additionally know whether the model is 99.99% sure or just 51% sure, that is, to know the uncertainty in the model's prediction. Lumen is built on top of the modelbase back-end, which provides a SQL-like interface for querying models and their data (Lucas, 2020).
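
    The distinction between point estimation and distributional prediction can be made concrete with a small sketch. This uses scikit-learn rather than Lumen's modelbase interface; the classifier and data are illustrative stand-ins for the skin-lesion example.

    ```python
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    # Toy data: two features, binary target.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    model = GaussianNB().fit(X, y)
    x_new = np.array([[0.05, -0.02]])

    print(model.predict(x_new))        # point estimate: a single class label
    print(model.predict_proba(x_new))  # full distribution, e.g. [[0.51 0.49]]
    ```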

    GENO -- GENeric Optimization for Classical Machine Learning

    Although optimization is the longstanding algorithmic backbone of machine learning, new models still require the time-consuming implementation of new solvers. As a result, there are thousands of implementations of optimization algorithms for machine learning problems. A natural question is whether it is always necessary to implement a new solver, or whether there is one algorithm that is sufficient for most models. Common belief suggests that such a one-algorithm-fits-all approach cannot work, because this algorithm cannot exploit model-specific structure and thus cannot be efficient and robust on a wide variety of problems. Here, we challenge this common belief. We have designed and implemented the optimization framework GENO (GENeric Optimization), which combines a modeling language with a generic solver. GENO generates a solver from the declarative specification of an optimization problem class. The framework is flexible enough to encompass most of the classical machine learning problems. We show on a wide variety of classical but also some recently suggested problems that the automatically generated solvers are (1) as efficient as well-engineered specialized solvers, (2) more efficient by a decent margin than recent state-of-the-art solvers, and (3) orders of magnitude more efficient than classical modeling-language-plus-solver approaches.
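
    The one-generic-solver idea can be sketched in a few lines: specify the problem once as an objective with a gradient and hand it to a general-purpose quasi-Newton method. The example below uses SciPy's L-BFGS-B on ridge regression; it is an illustrative analogue, not GENO's modeling language or its generated solvers.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def ridge(w, X, y, lam):
        """f(w) = 0.5*||Xw - y||^2 + 0.5*lam*||w||^2, with its gradient."""
        r = X @ w - y
        return 0.5 * r @ r + 0.5 * lam * w @ w, X.T @ r + lam * w

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
    res = minimize(ridge, np.zeros(5), args=(X, y, 1.0),
                   jac=True, method="L-BFGS-B")

    # The generic solver matches the closed-form ridge solution.
    w_star = np.linalg.solve(X.T @ X + np.eye(5), X.T @ y)
    assert np.allclose(res.x, w_star, atol=1e-4)
    ```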

    The conformal alpha shape filtration

    Conformal alpha shapes are a new filtration of the Delaunay triangulation of a finite set of points in $\mathbb{R}^d$. In contrast to (ordinary) alpha shapes, the new filtration is parameterized by a local scale parameter instead of the global scale parameter in alpha shapes. The local scale parameter conforms to the local geometry and is motivated by applications and previous algorithms in surface reconstruction. We show how conformal alpha shapes can be used for surface reconstruction of non-uniformly sampled surfaces, which is not possible with alpha shapes.
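
    For reference, a minimal sketch of the ordinary construction that the conformal variant generalizes: in 2D, the triangles of the alpha complex are exactly the Delaunay triangles whose circumradius is at most alpha. Lower-dimensional faces are omitted for brevity, and making alpha a per-point local scale is, roughly, the conformal idea; this sketch is not the paper's construction.

    ```python
    import numpy as np
    from scipy.spatial import Delaunay

    def alpha_triangles_2d(points, alpha):
        """Triangles of the 2D alpha complex: Delaunay triangles whose
        circumradius is at most alpha."""
        tri = Delaunay(points)
        keep = []
        for s in tri.simplices:
            p, q, r = points[s]
            a, b, c = (np.linalg.norm(q - r), np.linalg.norm(p - r),
                       np.linalg.norm(p - q))
            area = 0.5 * abs((q[0] - p[0]) * (r[1] - p[1])
                             - (q[1] - p[1]) * (r[0] - p[0]))
            if area > 0 and a * b * c / (4.0 * area) <= alpha:  # R = abc / (4A)
                keep.append(s)
        return np.array(keep)
    ```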

    Delaunay Triangulation Based Surface Reconstruction: Ideas and Algorithms

    Given a finite sample $P \subset \mathbb{R}^d$ of an unknown surface S, surface reconstruction is concerned with the calculation of a model of S from P. The model can be represented as a smooth or a triangulated surface, and is expected to match S from a topological and geometric standpoint. In this survey, we focus on the recent developments of Delaunay-based surface reconstruction methods, which were the first methods (and in a sense still the only ones) for which one can precisely state properties of the reconstructed surface. We outline the foundations of these methods from a geometric and algorithmic standpoint. In particular, a careful presentation of the hypotheses used by these algorithms sheds light on the intrinsic difficulties of the surface reconstruction problem faced by any method, Delaunay based or not.
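
    One classical member of this family, sketched below under the usual hedges, is the Amenta–Bern crust restricted to curves in the plane: add the Voronoi vertices of the sample to the point set, compute the Delaunay triangulation of the union, and keep exactly the edges that connect two original samples. This is one illustrative Delaunay-based algorithm, not a summary of the survey.

    ```python
    import numpy as np
    from scipy.spatial import Delaunay, Voronoi

    def crust_2d(points):
        """Amenta–Bern crust for curve reconstruction in the plane."""
        vor = Voronoi(points)
        aug = np.vstack([points, vor.vertices])   # samples + Voronoi vertices
        tri = Delaunay(aug)
        n = len(points)
        edges = set()
        for s in tri.simplices:
            for i in range(3):
                a, b = s[i], s[(i + 1) % 3]
                if a < n and b < n:               # keep edges between samples only
                    edges.add((min(a, b), max(a, b)))
        return sorted(edges)
    ```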

    Spectral Techniques to Explore Point Clouds in Euclidean Space, with Applications to Collective Coordinates in Structural Biology

    Life sciences, engineering, and telecommunications provide numerous systems whose description requires a large number of variables. Developing insights into such systems, forecasting their evolution, or monitoring them is often based on the inference of correlations between these variables. Given a collection of points describing states of the system, questions such as inferring the effective number of independent parameters of the system (its intrinsic dimensionality) and the way these are coupled are paramount to developing models. In this context, this paper makes two contributions. First, we review recent work on spectral techniques to organize point clouds in Euclidean space, with emphasis on the main difficulties faced. Second, after a careful presentation of the bio-physical context, we present applications of dimensionality reduction techniques to a core problem in structural biology, namely protein folding. Both from the computer science and the structural biology perspectives, we expect this survey to shed new light on the importance of nonlinear computational geometry in geometric data analysis in general, and for protein folding in particular.
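
    A minimal sketch of the kind of spectral technique surveyed here (Laplacian-eigenmaps/diffusion-maps style; the kernel bandwidth eps and the number of coordinates are illustrative parameters): build Gaussian affinities between sampled states, normalize, and read collective coordinates off the leading nontrivial eigenvectors. A gap in the eigenvalue sequence hints at the intrinsic dimensionality.

    ```python
    import numpy as np
    from scipy.spatial.distance import pdist, squareform

    def spectral_coordinates(X, eps, n_coords=2):
        """Collective coordinates from the spectrum of a normalized
        Gaussian affinity matrix (diffusion-maps-style sketch)."""
        K = np.exp(-squareform(pdist(X, "sqeuclidean")) / eps)
        d = K.sum(axis=1)
        M = K / np.sqrt(np.outer(d, d))           # symmetric normalization
        vals, vecs = np.linalg.eigh(M)            # ascending eigenvalues
        vals, vecs = vals[::-1], vecs[:, ::-1]    # reorder to descending
        # Skip the trivial top eigenvector; the next ones are coordinates.
        return vals[1:n_coords + 1], vecs[:, 1:n_coords + 1]
    ```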